Overview

Dataset statistics

Number of variables: 20
Number of observations: 338592
Missing cells: 478174
Missing cells (%): 7.1%
Duplicate rows: 785
Duplicate rows (%): 0.2%
Total size in memory: 51.7 MiB
Average record size in memory: 160.0 B
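
The percentages and the average record size in this block follow from the raw counts. The short sketch below recomputes them as a sanity check; the variable names are illustrative, and it assumes that "Missing cells (%)" is taken over all cells (variables times observations) and that 1 MiB is 1024 * 1024 bytes.

```python
# Figures copied from the "Dataset statistics" block of the report.
n_variables = 20
n_observations = 338_592
missing_cells = 478_174
duplicate_rows = 785
total_size_mib = 51.7

# Assumption: the missing-cell percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.0f} B")
```

The recomputed values round to the 7.1%, 0.2% and 160 B shown in the report.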

Variable types

Numeric: 13
Categorical: 6
Unsupported: 1

Warnings

Dataset has 785 (0.2%) duplicate rows [Duplicates]
NyusenJyuni is highly correlated with KakuteiJyuni [High correlation]
KakuteiJyuni is highly correlated with NyusenJyuni [High correlation]
DochakuKubun is highly correlated with DochakuTosu [High correlation]
DochakuTosu is highly correlated with DochakuKubun [High correlation]
Jyuni1c is highly correlated with Jyuni2c [High correlation]
Jyuni2c is highly correlated with Jyuni1c [High correlation]
Jyuni3c is highly correlated with Jyuni4c [High correlation]
Jyuni4c is highly correlated with Jyuni3c [High correlation]
ZogenFugo has 80498 (23.8%) missing values [Missing]
ZogenSa has 33110 (9.8%) missing values [Missing]
ChakusaCD has 26025 (7.7%) missing values [Missing]
ChakusaCDP has 338541 (> 99.9%) missing values [Missing]
KisyuCodeBefore is highly skewed (γ1 = 33.83524805) [Skewed]
ZogenSa is highly skewed (γ1 = 32.62146663) [Skewed]
ChakusaCDP is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
KisyuCodeBefore has 337138 (99.6%) zeros [Zeros]
ZogenSa has 47122 (13.9%) zeros [Zeros]
IJyoCD has 335526 (99.1%) zeros [Zeros]
Jyuni1c has 201936 (59.6%) zeros [Zeros]
Jyuni2c has 182501 (53.9%) zeros [Zeros]
Jyuni3c has 3433 (1.0%) zeros [Zeros]
Jyuni4c has 3758 (1.1%) zeros [Zeros]

Reproduction

Analysis started: 2021-04-07 12:48:39.367526
Analysis finished: 2021-04-07 12:50:11.788253
Duration: 1 minute and 32.42 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

KisyuCodeBefore
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 151
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.890576269
Minimum: 0
Maximum: 5538
Zeros: 337138
Zeros (%): 99.6%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 5538
Range: 5538
Interquartile range (IQR): 0
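
The quantile figures in a block like this one can be recomputed from raw values. The following is a minimal sketch using only the Python standard library. The function name `quantile_summary` and the toy data are illustrative, and it assumes linear interpolation between order statistics (the `inclusive` method), which may differ in edge cases from the routine the report generator actually uses.

```python
import statistics


def quantile_summary(values):
    """Return min, quartiles, max, range and IQR for a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between the sorted values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


# Toy column dominated by zeros, loosely modelled on KisyuCodeBefore
# (the individual values are made up for illustration).
summary = quantile_summary([0, 0, 0, 0, 0, 0, 0, 0, 0, 5538])
print(summary)
```

When zeros dominate a column, every quartile collapses to 0 and the IQR is 0 even though the range is large, which is the pattern reported above.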

Descriptive statistics

Standard deviation: 118.01475
Coefficient of variation (CV): 20.03449995
Kurtosis: 1416.406767
Mean: 5.890576269
Median Absolute Deviation (MAD): 0
Skewness: 33.83524805
Sum: 1994502
Variance: 13927.48121
Monotonicity: Not monotonic
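
Several of these descriptive statistics are tied together by simple identities: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below checks those two identities against the reported figures and shows one way to compute the remaining quantities. The helper `describe` is illustrative. It assumes a sample variance with n - 1 in the denominator and uses the plain moment-based skewness m3 / m2^1.5, which can differ slightly from the bias-adjusted estimator a library may report.

```python
import math
import statistics


def describe(values):
    """Mean, sample variance, std, CV, MAD and moment-based skewness."""
    n = len(values)
    mean = sum(values) / n
    squared = [(v - mean) ** 2 for v in values]
    variance = sum(squared) / (n - 1)  # assumption: sample variance (ddof = 1)
    std = math.sqrt(variance)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    m2 = sum(squared) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,
        "mad": mad,
        "skewness": m3 / m2 ** 1.5,  # unadjusted moment estimator
    }


# Cross-check of the figures reported for KisyuCodeBefore.
reported_std = 118.01475
reported_mean = 5.890576269
variance_from_std = reported_std ** 2
cv_from_std = reported_std / reported_mean
print(variance_from_std, cv_from_std)

# Toy data: mostly zeros with one large value (made up for illustration).
stats = describe([0, 0, 0, 0, 10])
print(stats)
```

In the toy data the MAD is 0 while the skewness is strongly positive, the same qualitative picture as the zero-dominated column above.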
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
0                    337138  99.6%
1075                 70      < 0.1%
1116                 47      < 0.1%
641                  46      < 0.1%
1085                 45      < 0.1%
1095                 44      < 0.1%
1174                 36      < 0.1%
1014                 35      < 0.1%
1122                 31      < 0.1%
1043                 30      < 0.1%
Other values (141)   1070    0.3%

Value                Count   Frequency (%)
0                    337138  99.6%
629                  4       < 0.1%
635                  5       < 0.1%
641                  46      < 0.1%
652                  7       < 0.1%

Value                Count   Frequency (%)
5538                 1       < 0.1%
5529                 7       < 0.1%
5498                 5       < 0.1%
5454                 3       < 0.1%
5386                 9       < 0.1%

MinaraiCD
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Value                Count
0                    294463
3                    22431
1                    12421
2                    8864
9                    413

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 338592
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
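
As a rough illustration of how such character properties can be looked up, the sketch below uses Python's standard `unicodedata` module to count general categories over a list of string values. The helper name `category_counts` is illustrative. The standard library exposes the general category (for example `Nd` for Decimal Number) but not the script or block, so only the category part of the tables is reproduced here.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character of every value."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five distinct values of MinaraiCD are single ASCII digits.
digit_counts = category_counts(["0", "3", "1", "2", "9"])
print(dict(digit_counts))

# Sign characters fall into two different categories.
plus_category = unicodedata.category("+")   # Math Symbol
minus_category = unicodedata.category("-")  # Dash Punctuation
print(plus_category, minus_category)
```

All digits share the single category `Nd`, which matches the one distinct category reported for this variable, while a column of `+` and `-` signs spans two categories (`Sm` and `Pd`).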

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Value                Count   Frequency (%)
0                    294463  87.0%
3                    22431   6.6%
1                    12421   3.7%
2                    8864    2.6%
9                    413     0.1%

Histogram of lengths of the category

Value                Count   Frequency (%)
0                    294463  87.0%
3                    22431   6.6%
1                    12421   3.7%
2                    8864    2.6%
9                    413     0.1%

Most occurring characters

Value                Count   Frequency (%)
0                    294463  87.0%
3                    22431   6.6%
1                    12421   3.7%
2                    8864    2.6%
9                    413     0.1%

Most occurring categories

Value                Count   Frequency (%)
Decimal Number       338592  100.0%

Most frequent character per category

Value                Count   Frequency (%)
0                    294463  87.0%
3                    22431   6.6%
1                    12421   3.7%
2                    8864    2.6%
9                    413     0.1%

Most occurring scripts

Value                Count   Frequency (%)
Common               338592  100.0%

Most frequent character per script

Value                Count   Frequency (%)
0                    294463  87.0%
3                    22431   6.6%
1                    12421   3.7%
2                    8864    2.6%
9                    413     0.1%

Most occurring blocks

Value                Count   Frequency (%)
ASCII                338592  100.0%

Most frequent character per block

Value                Count   Frequency (%)
0                    294463  87.0%
3                    22431   6.6%
1                    12421   3.7%
2                    8864    2.6%
9                    413     0.1%

MinaraiCDBefore
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Value                Count
0                    338407
2                    67
1                    55
3                    54
9                    9

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 338592
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Value                Count   Frequency (%)
0                    338407  99.9%
2                    67      < 0.1%
1                    55      < 0.1%
3                    54      < 0.1%
9                    9       < 0.1%

Histogram of lengths of the category

Value                Count   Frequency (%)
0                    338407  99.9%
2                    67      < 0.1%
1                    55      < 0.1%
3                    54      < 0.1%
9                    9       < 0.1%

Most occurring characters

Value                Count   Frequency (%)
0                    338407  99.9%
2                    67      < 0.1%
1                    55      < 0.1%
3                    54      < 0.1%
9                    9       < 0.1%

Most occurring categories

Value                Count   Frequency (%)
Decimal Number       338592  100.0%

Most frequent character per category

Value                Count   Frequency (%)
0                    338407  99.9%
2                    67      < 0.1%
1                    55      < 0.1%
3                    54      < 0.1%
9                    9       < 0.1%

Most occurring scripts

Value                Count   Frequency (%)
Common               338592  100.0%

Most frequent character per script

Value                Count   Frequency (%)
0                    338407  99.9%
2                    67      < 0.1%
1                    55      < 0.1%
3                    54      < 0.1%
9                    9       < 0.1%

Most occurring blocks

Value                Count   Frequency (%)
ASCII                338592  100.0%

Most frequent character per block

Value                Count   Frequency (%)
0                    338407  99.9%
2                    67      < 0.1%
1                    55      < 0.1%
3                    54      < 0.1%
9                    9       < 0.1%

BaTaijyu
Real number (ℝ≥0)

Distinct: 154
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 469.9620783
Minimum: 0
Maximum: 999
Zeros: 669
Zeros (%): 0.2%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 422
Q1: 450
Median: 470
Q3: 490
95th percentile: 522
Maximum: 999
Range: 999
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 36.75072664
Coefficient of variation (CV): 0.07819934488
Kurtosis: 51.4958348
Mean: 469.9620783
Median Absolute Deviation (MAD): 20
Skewness: -4.023421006
Sum: 159125400
Variance: 1350.615909
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
470                  9260    2.7%
468                  8948    2.6%
472                  8932    2.6%
474                  8869    2.6%
460                  8755    2.6%
480                  8733    2.6%
476                  8712    2.6%
478                  8712    2.6%
466                  8660    2.6%
464                  8612    2.5%
Other values (144)   250399  74.0%

Value                Count   Frequency (%)
0                    669     0.2%
330                  2       < 0.1%
334                  1       < 0.1%
336                  4       < 0.1%
338                  3       < 0.1%

Value                Count   Frequency (%)
999                  2       < 0.1%
640                  1       < 0.1%
638                  1       < 0.1%
636                  2       < 0.1%
634                  2       < 0.1%

ZogenFugo
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 80498
Missing (%): 23.8%
Memory size: 2.6 MiB

Value                Count
+                    130480
-                    127614

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 258094
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -
2nd row: -
3rd row: +
4th row: +
5th row: -

Value                Count   Frequency (%)
+                    130480  38.5%
-                    127614  37.7%
(Missing)            80498   23.8%

Histogram of lengths of the category

Value                Count   Frequency (%)
(blank)              258094  100.0%

Most occurring characters

Value                Count   Frequency (%)
+                    130480  50.6%
-                    127614  49.4%

Most occurring categories

Value                Count   Frequency (%)
Math Symbol          130480  50.6%
Dash Punctuation     127614  49.4%

Most frequent character per category

Dash Punctuation

Value                Count   Frequency (%)
-                    127614  100.0%

Math Symbol

Value                Count   Frequency (%)
+                    130480  100.0%

Most occurring scripts

Value                Count   Frequency (%)
Common               258094  100.0%

Most frequent character per script

Value                Count   Frequency (%)
+                    130480  50.6%
-                    127614  49.4%

Most occurring blocks

Value                Count   Frequency (%)
ASCII                258094  100.0%

Most frequent character per block

Value                Count   Frequency (%)
+                    130480  50.6%
-                    127614  49.4%

ZogenSa
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct: 47
Distinct (%): < 0.1%
Missing: 33110
Missing (%): 9.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.856109362
Minimum: 0
Maximum: 999
Zeros: 47122
Zeros (%): 13.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 4
Q3: 6
95th percentile: 14
Maximum: 999
Range: 999
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 29.68229784
Coefficient of variation (CV): 5.068603744
Kurtosis: 1088.355326
Mean: 5.856109362
Median Absolute Deviation (MAD): 2
Skewness: 32.62146663
Sum: 1788936
Variance: 881.0388048
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=47)
Value                Count   Frequency (%)
2                    77179   22.8%
4                    60828   18.0%
0                    47122   13.9%
6                    42280   12.5%
8                    27494   8.1%
10                   17706   5.2%
12                   10817   3.2%
14                   6695    2.0%
16                   4159    1.2%
18                   2845    0.8%
Other values (37)    8357    2.5%
(Missing)            33110   9.8%

Value                Count   Frequency (%)
0                    47122   13.9%
1                    714     0.2%
2                    77179   22.8%
3                    616     0.2%
4                    60828   18.0%

Value                Count   Frequency (%)
999                  266     0.1%
54                   3       < 0.1%
48                   4       < 0.1%
46                   3       < 0.1%
44                   8       < 0.1%

IJyoCD
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.02952225688
Minimum: 0
Maximum: 7
Zeros: 335526
Zeros (%): 99.1%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 7
Range: 7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3320090787
Coefficient of variation (CV): 11.24606022
Kurtosis: 150.1750672
Mean: 0.02952225688
Median Absolute Deviation (MAD): 0
Skewness: 11.98629024
Sum: 9996
Variance: 0.1102300283
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value                Count   Frequency (%)
0                    335526  99.1%
4                    1803    0.5%
1                    626     0.2%
3                    568     0.2%
7                    54      < 0.1%
5                    14      < 0.1%
6                    1       < 0.1%

Value                Count   Frequency (%)
0                    335526  99.1%
1                    626     0.2%
3                    568     0.2%
4                    1803    0.5%
5                    14      < 0.1%

Value                Count   Frequency (%)
7                    54      < 0.1%
6                    1       < 0.1%
5                    14      < 0.1%
4                    1803    0.5%
3                    568     0.2%

NyusenJyuni
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.93107634
Minimum: 0
Maximum: 18
Zeros: 2997
Zeros (%): 0.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 4
Median: 8
Q3: 12
95th percentile: 15
Maximum: 18
Range: 18
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.531754262
Coefficient of variation (CV): 0.571392087
Kurtosis: -1.02460413
Mean: 7.93107634
Median Absolute Deviation (MAD): 4
Skewness: 0.1565686563
Sum: 2685399
Variance: 20.53679669
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value                Count   Frequency (%)
7                    23280   6.9%
5                    23062   6.8%
8                    23042   6.8%
1                    23003   6.8%
6                    23002   6.8%
4                    22898   6.8%
3                    22776   6.7%
9                    22768   6.7%
2                    22644   6.7%
10                   22186   6.6%
Other values (9)     109931  32.5%

Value                Count   Frequency (%)
0                    2997    0.9%
1                    23003   6.8%
2                    22644   6.7%
3                    22776   6.7%
4                    22898   6.8%

Value                Count   Frequency (%)
18                   2134    0.6%
17                   2723    0.8%
16                   11624   3.4%
15                   14505   4.3%
14                   16571   4.9%

KakuteiJyuni
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.930529959
Minimum: 0
Maximum: 18
Zeros: 3011
Zeros (%): 0.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 4
Median: 8
Q3: 12
95th percentile: 15
Maximum: 18
Range: 18
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.531876587
Coefficient of variation (CV): 0.571446878
Kurtosis: -1.024555501
Mean: 7.930529959
Median Absolute Deviation (MAD): 4
Skewness: 0.1565859261
Sum: 2685214
Variance: 20.5379054
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value                Count   Frequency (%)
7                    23278   6.9%
5                    23060   6.8%
8                    23040   6.8%
6                    23003   6.8%
1                    23001   6.8%
4                    22898   6.8%
3                    22779   6.7%
9                    22768   6.7%
2                    22645   6.7%
10                   22186   6.6%
Other values (9)     109934  32.5%

Value                Count   Frequency (%)
0                    3011    0.9%
1                    23001   6.8%
2                    22645   6.7%
3                    22779   6.7%
4                    22898   6.8%

Value                Count   Frequency (%)
18                   2135    0.6%
17                   2718    0.8%
16                   11626   3.4%
15                   14504   4.3%
14                   16565   4.9%

DochakuKubun
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Value                Count
0                    337160
1                    1432

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 338592
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1432    0.4%

Histogram of lengths of the category

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1432    0.4%

Most occurring characters

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1432    0.4%

Most occurring categories

Value                Count   Frequency (%)
Decimal Number       338592  100.0%

Most frequent character per category

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1432    0.4%

Most occurring scripts

Value                Count   Frequency (%)
Common               338592  100.0%

Most frequent character per script

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1432    0.4%

Most occurring blocks

Value                Count   Frequency (%)
ASCII                338592  100.0%

Most frequent character per block

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1432    0.4%

DochakuTosu
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Value                Count
0                    337160
1                    1424
2                    8

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 338592
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1424    0.4%
2                    8       < 0.1%

Histogram of lengths of the category

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1424    0.4%
2                    8       < 0.1%

Most occurring characters

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1424    0.4%
2                    8       < 0.1%

Most occurring categories

Value                Count   Frequency (%)
Decimal Number       338592  100.0%

Most frequent character per category

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1424    0.4%
2                    8       < 0.1%

Most occurring scripts

Value                Count   Frequency (%)
Common               338592  100.0%

Most frequent character per script

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1424    0.4%
2                    8       < 0.1%

Most occurring blocks

Value                Count   Frequency (%)
ASCII                338592  100.0%

Most frequent character per block

Value                Count   Frequency (%)
0                    337160  99.6%
1                    1424    0.4%
2                    8       < 0.1%

Time
Real number (ℝ≥0)

Distinct: 2097
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1503.568061
Minimum: 0
Maximum: 5181
Zeros: 2997
Zeros (%): 0.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 1091
Q1: 1214
Median: 1384
Q3: 1567
95th percentile: 2319
Maximum: 5181
Range: 5181
Interquartile range (IQR): 353

Descriptive statistics

Standard deviation: 503.391801
Coefficient of variation (CV): 0.3347981471
Kurtosis: 5.026741454
Mean: 1503.568061
Median Absolute Deviation (MAD): 179
Skewness: 1.597273709
Sum: 509096117
Variance: 253403.3053
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
0                    2997    0.9%
1121                 1148    0.3%
1122                 1127    0.3%
1127                 1122    0.3%
1130                 1117    0.3%
1124                 1114    0.3%
1131                 1107    0.3%
1123                 1103    0.3%
1120                 1093    0.3%
1128                 1089    0.3%
Other values (2087)  325575  96.2%

Value                Count   Frequency (%)
0                    2997    0.9%
538                  1       < 0.1%
540                  1       < 0.1%
542                  8       < 0.1%
543                  6       < 0.1%

Value                Count   Frequency (%)
5181                 1       < 0.1%
5160                 1       < 0.1%
5125                 1       < 0.1%
5121                 1       < 0.1%
5112                 1       < 0.1%

ChakusaCD
Categorical

MISSING

Distinct: 22
Distinct (%): < 0.1%
Missing: 26025
Missing (%): 7.7%
Memory size: 2.6 MiB

Value                Count
K                    59312
12                   36970
114                  29528
34                   26410
H                    23727
Other values (17)    136620

Length

Max length: 3
Median length: 1
Mean length: 1.756932114
Min length: 1

Characters and Unicode

Total characters: 549159
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: H
2nd row: 134
3rd row: 212
4th row: H
5th row: K

Value                Count   Frequency (%)
K                    59312   17.5%
12                   36970   10.9%
114                  29528   8.7%
34                   26410   7.8%
H                    23727   7.0%
212                  17661   5.2%
134                  16320   4.8%
A                    14448   4.3%
112                  14360   4.2%
2                    12469   3.7%
Other values (12)    61362   18.1%
(Missing)            26025   7.7%

Histogram of lengths of the category

Value                Count   Frequency (%)
k                    59312   19.0%
12                   36970   11.8%
114                  29528   9.4%
34                   26410   8.4%
h                    23727   7.6%
212                  17661   5.7%
134                  16320   5.2%
a                    14448   4.6%
112                  14360   4.6%
2                    12469   4.0%
Other values (12)    61362   19.6%

Most occurring characters

Value                Count   Frequency (%)
1                    176868  32.2%
2                    107858  19.6%
4                    79070   14.4%
3                    61977   11.3%
K                    59312   10.8%
H                    23727   4.3%
A                    14448   2.6%
5                    7693    1.4%
T                    6905    1.3%
7                    3566    0.6%
Other values (5)     7735    1.4%

Most occurring categories

Value                Count   Frequency (%)
Decimal Number       443191  80.7%
Uppercase Letter     105968  19.3%

Most frequent character per category

Decimal Number

Value                Count   Frequency (%)
1                    176868  39.9%
2                    107858  24.3%
4                    79070   17.8%
3                    61977   14.0%
5                    7693    1.7%
7                    3566    0.8%
6                    2701    0.6%
8                    1792    0.4%
9                    1666    0.4%

Uppercase Letter

Value                Count   Frequency (%)
K                    59312   56.0%
H                    23727   22.4%
A                    14448   13.6%
T                    6905    6.5%
Z                    866     0.8%
D                    710     0.7%

Most occurring scripts

Value                Count   Frequency (%)
Common               443191  80.7%
Latin                105968  19.3%

Most frequent character per script

Common

Value                Count   Frequency (%)
1                    176868  39.9%
2                    107858  24.3%
4                    79070   17.8%
3                    61977   14.0%
5                    7693    1.7%
7                    3566    0.8%
6                    2701    0.6%
8                    1792    0.4%
9                    1666    0.4%

Latin

Value                Count   Frequency (%)
K                    59312   56.0%
H                    23727   22.4%
A                    14448   13.6%
T                    6905    6.5%
Z                    866     0.8%
D                    710     0.7%

Most occurring blocks

Value                Count   Frequency (%)
ASCII                549159  100.0%

Most frequent character per block

Value                Count   Frequency (%)
1                    176868  32.2%
2                    107858  19.6%
4                    79070   14.4%
3                    61977   11.3%
K                    59312   10.8%
H                    23727   4.3%
A                    14448   2.6%
5                    7693    1.4%
T                    6905    1.3%
7                    3566    0.6%
Other values (5)     7735    1.4%

ChakusaCDP
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 338541
Missing (%): > 99.9%
Memory size: 2.6 MiB

Jyuni1c
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.947296451
Minimum: 0
Maximum: 18
Zeros: 201936
Zeros (%): 59.6%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 5
95th percentile: 13
Maximum: 18
Range: 18
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.508988032
Coefficient of variation (CV): 1.529872582
Kurtosis: 0.6206030235
Mean: 2.947296451
Median Absolute Deviation (MAD): 0
Skewness: 1.37122161
Sum: 997931
Variance: 20.33097308
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value                Count   Frequency (%)
0                    201936  59.6%
2                    12511   3.7%
3                    12030   3.6%
1                    9897    2.9%
5                    9867    2.9%
4                    9834    2.9%
7                    9768    2.9%
6                    9754    2.9%
8                    9667    2.9%
9                    9219    2.7%
Other values (9)     44109   13.0%

Value                Count   Frequency (%)
0                    201936  59.6%
1                    9897    2.9%
2                    12511   3.7%
3                    12030   3.6%
4                    9834    2.9%

Value                Count   Frequency (%)
18                   374     0.1%
17                   517     0.2%
16                   3022    0.9%
15                   4292    1.3%
14                   5405    1.6%

Jyuni2c
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.364113742
Minimum: 0
Maximum: 18
Zeros: 182501
Zeros (%): 53.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 6
95th percentile: 13
Maximum: 18
Range: 18
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.668570699
Coefficient of variation (CV): 1.387756496
Kurtosis: 0.1086670373
Mean: 3.364113742
Median Absolute Deviation (MAD): 0
Skewness: 1.177530478
Sum: 1139062
Variance: 21.79555237
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value                Count   Frequency (%)
0                    182501  53.9%
3                    14220   4.2%
2                    14208   4.2%
5                    11926   3.5%
7                    11378   3.4%
1                    11304   3.3%
6                    10879   3.2%
8                    10816   3.2%
9                    10715   3.2%
4                    10471   3.1%
Other values (9)     50174   14.8%

Value                Count   Frequency (%)
0                    182501  53.9%
1                    11304   3.3%
2                    14208   4.2%
3                    14220   4.2%
4                    10471   3.1%

Value                Count   Frequency (%)
18                   423     0.1%
17                   601     0.2%
16                   3503    1.0%
15                   4899    1.4%
14                   6225    1.8%

Jyuni3c
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.42868408
Minimum: 0
Maximum: 18
Zeros: 3433
Zeros (%): 1.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 3
Median: 7
Q3: 11
95th percentile: 15
Maximum: 18
Range: 18
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.484743762
Coefficient of variation (CV): 0.6037063515
Kurtosis: -0.9927566117
Mean: 7.42868408
Median Absolute Deviation (MAD): 4
Skewness: 0.277582547
Sum: 2515293
Variance: 20.11292661
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value                Count   Frequency (%)
2                    31569   9.3%
3                    27438   8.1%
4                    24434   7.2%
7                    24344   7.2%
5                    23565   7.0%
1                    23230   6.9%
8                    22506   6.6%
6                    21536   6.4%
9                    21443   6.3%
10                   21038   6.2%
Other values (9)     97489   28.8%

Value                Count   Frequency (%)
0                    3433    1.0%
1                    23230   6.9%
2                    31569   9.3%
3                    27438   8.1%
4                    24434   7.2%

Value                Count   Frequency (%)
18                   1581    0.5%
17                   2186    0.6%
16                   9087    2.7%
15                   12155   3.6%
14                   14654   4.3%

Jyuni4c
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.406710731
Minimum: 0
Maximum: 18
Zeros: 3758
Zeros (%): 1.1%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 3
Median: 7
Q3: 11
95th percentile: 15
Maximum: 18
Range: 18
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.485211088
Coefficient of variation (CV): 0.6055604506
Kurtosis: -0.9919008073
Mean: 7.406710731
Median Absolute Deviation (MAD): 4
Skewness: 0.2772845513
Sum: 2507853
Variance: 20.11711851
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value                Count   Frequency (%)
2                    32733   9.7%
3                    26597   7.9%
7                    24376   7.2%
4                    24360   7.2%
1                    23156   6.8%
5                    23141   6.8%
8                    22563   6.7%
6                    21940   6.5%
10                   21245   6.3%
9                    21162   6.2%
Other values (9)     97319   28.7%

Value                Count   Frequency (%)
0                    3758    1.1%
1                    23156   6.8%
2                    32733   9.7%
3                    26597   7.9%
4                    24360   7.2%

Value                Count   Frequency (%)
18                   1543    0.5%
17                   2097    0.6%
16                   9043    2.7%
15                   12193   3.6%
14                   14309   4.2%

Odds
Real number (ℝ≥0)

Distinct: 6366
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 714.9489149
Minimum: 0
Maximum: 9999
Zeros: 1194
Zeros (%): 0.4%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 29
Q1: 89
Median: 272
Q3: 904
95th percentile: 2917
Maximum: 9999
Range: 9999
Interquartile range (IQR): 815

Descriptive statistics

Standard deviation: 1012.403294
Coefficient of variation (CV): 1.416049837
Kurtosis: 7.054345767
Mean: 714.9489149
Median Absolute Deviation (MAD): 222
Skewness: 2.417226804
Sum: 242075983
Variance: 1024960.43
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
38                   1412    0.4%
46                   1412    0.4%
43                   1366    0.4%
42                   1355    0.4%
35                   1352    0.4%
37                   1349    0.4%
36                   1346    0.4%
34                   1345    0.4%
41                   1339    0.4%
39                   1338    0.4%
Other values (6356)  324978  96.0%

Value                Count   Frequency (%)
0                    1194    0.4%
11                   68      < 0.1%
12                   134     < 0.1%
13                   250     0.1%
14                   445     0.1%

Value                Count   Frequency (%)
9999                 3       < 0.1%
9445                 1       < 0.1%
9332                 1       < 0.1%
9327                 1       < 0.1%
9107                 1       < 0.1%

Ninki
Real number (ℝ≥0)

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.040130895
Minimum: 0
Maximum: 18
Zeros: 1194
Zeros (%): 0.4%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 4
Median: 8
Q3: 12
95th percentile: 16
Maximum: 18
Range: 18
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.515613858
Coefficient of variation (CV): 0.561634371
Kurtosis: -1.033741771
Mean: 8.040130895
Median Absolute Deviation (MAD): 4
Skewness: 0.1487470144
Sum: 2722324
Variance: 20.39076851
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value                Count   Frequency (%)
7                    23205   6.9%
8                    23183   6.8%
6                    23049   6.8%
5                    23017   6.8%
9                    22931   6.8%
4                    22887   6.8%
3                    22587   6.7%
2                    22444   6.6%
1                    22398   6.6%
10                   22337   6.6%
Other values (9)     110554  32.7%

Value                Count   Frequency (%)
0                    1194    0.4%
1                    22398   6.6%
2                    22444   6.6%
3                    22587   6.7%
4                    22887   6.8%

Value                Count   Frequency (%)
18                   2273    0.7%
17                   2766    0.8%
16                   12228   3.6%
15                   14831   4.4%
14                   17031   5.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
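
The calculation described above can be sketched in a few lines of plain Python. The function name `pearson_r` and the sample data are illustrative; the sketch assumes equal-length inputs with non-zero variance and is not the code the report generator itself runs.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The (n - 1) factors cancel in the ratio, but are kept for clarity.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)

# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r(xs, [10 * y + 7 for y in ys])
print(r, r_rescaled)
```

The second call illustrates the invariance under changes of location and scale mentioned above.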

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
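
A minimal sketch of that recipe: replace each variable by its ranks, then apply the Pearson formula to the ranks. The helper names are illustrative, and the sketch assumes that tied values receive the average of the ranks they span, which is a common convention.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: rho is 1 while r is below 1.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```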

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
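
The pair-counting definition above can be written directly as a short function. This is a sketch of the simplest variant (often called tau-a), under the assumption that tied pairs count as neither concordant nor discordant. Statistical libraries commonly report a tie-adjusted variant (tau-b), so values can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y; it contributes to neither count.
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

In the example, two of the three pairs are concordant and one is discordant, so τ is (2 - 1) / 3.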

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
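
For orientation, the following sketch computes a bias-corrected Cramér's V from a contingency table of counts, following the correction usually attributed to Bergsma (2013): the squared phi coefficient and the table dimensions are each shrunk by a term in 1 / (n - 1). The function name and the example tables are illustrative, and the sketch assumes every row and column total is positive.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


independent = [[10, 10], [10, 10]]
perfect = [[50, 0], [0, 50]]
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```

The two toy tables hit the ends of the range: 0 for an independent table and 1 for a perfectly associated one.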

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
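
Nullity correlation, as described for the heatmap, can be approximated by correlating "value is present" indicators of two columns. The sketch below does this with plain Python. The function name and the toy columns are illustrative (they are not rows from this dataset), and missing values are assumed to be represented by `None`.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no
        # defined nullity correlation.
        return None
    return cov / math.sqrt(var_a * var_b)


# Toy columns: the first two are missing in exactly the same rows.
sign = ["-", "+", None, "+", None, "-"]
change = [18.0, 20.0, None, 4.0, None, 6.0]
other = [None, 1, 2, None, 3, 4]

same_pattern = nullity_correlation(sign, change)
different_pattern = nullity_correlation(sign, other)
print(same_pattern, different_pattern)
```

A value near 1 means the two columns tend to be missing together, and a negative value means one tends to be present where the other is missing.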

Sample

First rows

(index),KisyuCodeBefore,MinaraiCD,MinaraiCDBefore,BaTaijyu,ZogenFugo,ZogenSa,IJyoCD,NyusenJyuni,KakuteiJyuni,DochakuKubun,DochakuTosu,Time,ChakusaCD,ChakusaCDP,Jyuni1c,Jyuni2c,Jyuni3c,Jyuni4c,Odds,Ninki
0,0,0,0,502.00,-,18.00,0,7,7,0,0,1096,H,NaN,0,0,3,3,61,4
1,0,0,0,498.00,-,4.00,0,9,9,0,0,1108,134,NaN,0,0,3,5,80,5
2,0,0,0,520.00,+,20.00,0,5,5,0,0,1105,212,NaN,0,0,3,3,58,3
3,0,0,0,500.00,NaN,0.00,0,11,11,0,0,1108,H,NaN,0,0,3,3,62,4
4,0,0,0,474.00,+,4.00,0,3,3,0,0,1095,K,NaN,0,0,8,6,46,2
5,0,0,0,480.00,NaN,0.00,0,3,3,0,0,1094,112,NaN,0,0,9,8,30,1
6,0,0,0,474.00,-,6.00,0,11,11,0,0,1103,H,NaN,0,0,10,10,22,1
7,0,0,0,476.00,-,6.00,0,1,1,0,0,1083,NaN,NaN,0,0,1,1,18,1
8,0,0,0,480.00,+,4.00,0,4,4,0,0,1101,K,NaN,0,0,12,10,30,1
9,0,0,0,480.00,-,2.00,0,2,2,0,0,1098,K,NaN,0,0,13,12,100,3

Last rows

(index),KisyuCodeBefore,MinaraiCD,MinaraiCDBefore,BaTaijyu,ZogenFugo,ZogenSa,IJyoCD,NyusenJyuni,KakuteiJyuni,DochakuKubun,DochakuTosu,Time,ChakusaCD,ChakusaCDP,Jyuni1c,Jyuni2c,Jyuni3c,Jyuni4c,Odds,Ninki
338582,0,0,0,458.00,-,15.00,0,3,3,0,0,1098,K,NaN,0,0,10,7,414,5
338583,0,0,0,418.00,-,16.00,0,17,17,0,0,1132,134,NaN,0,0,8,11,2902,16
338584,0,0,0,422.00,+,10.00,0,15,15,0,0,1101,112,NaN,0,0,3,5,309,10
338585,0,3,0,452.00,NaN,nan,0,8,8,0,0,2017,312,NaN,10,9,13,13,1164,17
338586,0,0,0,482.00,-,4.00,0,4,4,0,0,589,K,NaN,0,0,4,4,162,7
338587,0,0,0,436.00,-,2.00,0,11,11,0,0,1104,2,NaN,0,0,9,10,2960,11
338588,0,0,0,524.00,-,4.00,0,10,10,0,0,1475,2,NaN,2,3,3,3,476,10
338589,0,0,0,480.00,+,2.00,0,17,17,0,0,1147,T,NaN,0,0,17,17,2694,12
338590,0,0,0,466.00,-,22.00,0,9,9,0,0,2031,12,NaN,10,9,9,11,1376,13
338591,0,0,0,470.00,NaN,nan,0,16,16,0,0,1552,212,NaN,14,14,14,16,3080,16